11 Assignment 1 (Reading data)

Instructions

  1. You may discuss the questions and potential approaches with a friend. However, you must write your own solutions and code separately, not as a group activity.

  2. Do not write your name on the assignment.

  3. Write your code in the Code cells and your answer in the Markdown cells of the Jupyter notebook. Ensure that the solution is written neatly enough to understand and grade.

  4. Use Quarto to render the .ipynb file as HTML. Open the command prompt, navigate to the directory containing the file, and run the command: quarto render filename.ipynb --to html. Submit the HTML file.

  5. The assignment is worth 100 points, and is due on 6th October 2022 at 11:59 pm.

12 I

USA’s GDP per capita from 1960 to 2021 is given by the tuple T in the code cell below. The values are arranged in ascending order of the year, i.e., the first value is for 1960, the second value is for 1961, and so on.

T = (3007, 3067, 3244, 3375, 3574, 3828, 4146, 4336, 4696, 5032, 5234, 5609, 6094, 6726, 7226, 7801, 8592, 9453, 10565, 11674, 12575, 13976, 14434, 15544, 17121, 18237, 19071, 20039, 21417, 22857, 23889, 24342, 25419, 26387, 27695, 28691, 29968, 31459, 32854, 34515, 36330, 37134, 37998, 39490, 41725, 44123, 46302, 48050, 48570, 47195, 48651, 50066, 51784, 53291, 55124, 56763, 57867, 59915, 62805, 65095, 63028, 69288)

12.1 I-1

12.1.1 I-1(a)

Use a list comprehension to produce a list of the gaps between consecutive entries in T, i.e., the increase in GDP per capita with respect to the previous year. The list of gaps should look like: [60, 177, …].

(6 points)
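
As an illustration of the pattern only, consecutive differences can be computed by pairing each entry with its successor. The tuple `toy` below is made-up data, not the assignment's tuple T, and this is one possible sketch rather than the expected solution.

```python
# Hypothetical toy data, not the assignment's tuple T
toy = (10, 13, 12, 20)

# Pair each value with its successor and subtract
toy_gaps = [later - earlier for earlier, later in zip(toy, toy[1:])]
print(toy_gaps)  # [3, -1, 8]
```

A sequence of n values always yields n - 1 gaps, because the first value has no predecessor.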

12.1.2 I-1(b)

Use the list comprehension developed in (a) to find the maximum gap size, i.e., the maximum increase in GDP per capita.

(2 points)

12.1.3 I-1(c)

Use the list comprehension developed in (a) to find the percentage of gaps higher than $1000.

(5 points)
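
A percentage of this kind is a count divided by the total number of items. The sketch below shows the idea on a made-up list; `toy_gaps` and `threshold` are hypothetical names and values, not taken from T.

```python
# Hypothetical list of gaps, not derived from T
toy_gaps = [3, -1, 8, 5]
threshold = 4  # illustrative cutoff

# Count the gaps above the cutoff and divide by the total number of gaps
share_above = 100 * sum(gap > threshold for gap in toy_gaps) / len(toy_gaps)
print(share_above)  # 50.0
```

Summing the comparisons works because Python treats True as 1 and False as 0.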

12.2 I-2

12.2.1 I-2(a)

Create a dictionary D, where each key is a year and the corresponding value is the increase in GDP per capita in that year with respect to the previous year, i.e., the gaps computed in I-1(a).

(6 points)
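
One way to pair years with gaps is a dictionary comprehension. The sketch below uses made-up values and a hypothetical `start_year`; it illustrates the indexing only and is not the expected solution.

```python
# Hypothetical toy data: values for consecutive years starting in 2000
start_year = 2000
toy = (10, 13, 12, 20)

# The first gap belongs to the second year, since the first year has no predecessor
toy_dict = {
    start_year + i: toy[i] - toy[i - 1]
    for i in range(1, len(toy))
}
print(toy_dict)  # {2001: 3, 2002: -1, 2003: 8}
```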

12.2.2 I-2(b)

Use the dictionary D to find the year when the GDP per capita increase from the previous year was the maximum.

(4 points)

12.2.3 I-2(c)

Use the dictionary D to find the years when the GDP per capita decreased with respect to the previous year.

(4 points)
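
Looking up keys by their values can be sketched as follows. The dictionary `toy_dict` is made-up data, and the variable names are hypothetical.

```python
# Hypothetical year-to-gap dictionary, not the assignment's D
toy_dict = {2001: 3, 2002: -1, 2003: 8}

# max() over a dictionary iterates over its keys; key=toy_dict.get compares them by value
year_of_max = max(toy_dict, key=toy_dict.get)

# Keep the keys whose value is negative
years_with_drop = [year for year, gap in toy_dict.items() if gap < 0]

print(year_of_max)      # 2003
print(years_with_drop)  # [2002]
```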

13 II

13.1 II-1

Read the data on TED talks.

(2 points)

13.2 II-2

Find the number of talks in the dataset.

(2 points)

13.3 II-3

Find the headline, speaker and year_filmed of the talk with the highest number of views.

(5 points)

13.4 II-4

Do the majority of talks have fewer views than the average number of views for a talk? Justify your answer.

(4 points)

Hint: Print summary statistics for questions II-4 and II-5.
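
As a sketch of how summary statistics support this kind of reasoning, the example below assumes the data is held in a pandas DataFrame. The frame `toy` and its values are made up and do not come from the TED dataset.

```python
import pandas as pd

# Hypothetical view counts, not taken from the TED dataset
toy = pd.DataFrame({"views": [100, 200, 300, 400, 9000]})

# describe() reports the count, mean, standard deviation, min, quartiles, and max
summary = toy["views"].describe()
print(summary)

# Mean above the median: at least half of the values lie below the mean
print(summary["mean"] > summary["50%"])
# Mean above the 75th percentile: at most 25% of the values lie above the mean
print(summary["mean"] > summary["75%"])
```

In this toy data a single large value pulls the mean (2000) far above the median (300), so both comparisons are true.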

13.5 II-5

Do at least 25% of the talks have more views than the average number of views for a talk? Justify your answer.

(4 points)

13.6 II-6

13.6.1 II-6(a)

The last column of the dataset consists of votes obtained by the talk under different categories, such as Funny, Confusing, Fascinating, etc. For each category, create a new column in the dataset that contains the votes obtained by the TED talk in that category. Print the first 5 rows of the updated dataset.

(20 points)
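
The sketch below shows one way to expand a column of per-category counts into separate columns. It rests on assumptions that should be checked against the real data: the column name `votes` is hypothetical, and each cell is assumed to hold a string that looks like a Python list of dictionaries with `name` and `count` keys. The actual dataset may use a different layout.

```python
import ast
import pandas as pd

# Hypothetical data; the column name "votes" and the cell format are assumptions
toy = pd.DataFrame({
    "headline": ["Talk A", "Talk B"],
    "votes": [
        "[{'name': 'Funny', 'count': 3}, {'name': 'Confusing', 'count': 1}]",
        "[{'name': 'Confusing', 'count': 4}, {'name': 'Funny', 'count': 0}]",
    ],
})

def votes_to_counts(cell):
    """Parse one cell into a {category: count} dictionary."""
    return {entry["name"]: entry["count"] for entry in ast.literal_eval(cell)}

# Build one column per category and attach the columns to the original frame
counts = pd.DataFrame(toy["votes"].apply(votes_to_counts).tolist(), index=toy.index)
toy = pd.concat([toy, counts], axis=1)
print(toy.head())
```

Parsing with `ast.literal_eval` avoids `eval`, since it only accepts Python literals.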

13.6.2 II-6(b)

With the data created in (a), find the headline of the talk that received the highest number of Confusing votes.

(5 points)

13.6.3 II-6(c)

With the data created in (a), find the headline and the year of the talk that received the highest percentage of votes in the Fascinating category.

\[\text{Percentage of } \textit{Fascinating} \text{ votes for a TED talk} = \frac{\text{Number of votes in the } \textit{Fascinating} \text{ category}}{\text{Total votes in all categories}}\]

(10 points)
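
The ratio in the formula above can be checked by hand on a single made-up talk. The dictionary `toy_votes` is hypothetical and contains only three categories for brevity.

```python
# Hypothetical vote counts for a single talk
toy_votes = {"Funny": 10, "Confusing": 5, "Fascinating": 35}

# Votes in the Fascinating category divided by the total votes in all categories
fascinating_share = toy_votes["Fascinating"] / sum(toy_votes.values())
print(fascinating_share)  # 0.7
```

Multiplying the ratio by 100 expresses it as a percentage. The scaling does not change which talk has the highest value.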

14 III

14.1 III-1

Download the dataset “univ.txt” and read it with Python.

(2 points)

14.2 III-2

14.2.1 III-2(a)

Find summary statistics of the data. Based on the statistics, answer parts (b) through (e).

(1 point)

14.2.2 III-2(b)

How many universities are there in the data set?

(2 points)

14.2.3 III-2(c)

Estimate the maximum Tuition and fees among universities that are in the bottom 25% when ranked by total tuition and fees.

(3 points)

14.2.4 III-2(d)

How many universities share the ranking of 220? (If s universities share the same rank, say r, then the next lower rank is r+s, and all the ranks between r and r+s are dropped.)

(5 points)
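
The tie rule described in the question can be illustrated on a short made-up list of ranks. The list `toy_ranks` is hypothetical and unrelated to the university data.

```python
from collections import Counter

# Hypothetical ranks following the tie rule:
# two items share rank 2, so rank 3 is dropped and the next rank is 4
toy_ranks = [1, 2, 2, 4, 5, 5, 5, 8]

# Count how many items hold each rank
rank_counts = Counter(toy_ranks)
print(rank_counts[2])  # 2
print(rank_counts[5])  # 3

# Equivalent check using the rule: (next rank after r) - r items share rank r
next_rank_after_2 = min(rank for rank in toy_ranks if rank > 2)
print(next_rank_after_2 - 2)  # 2
```

When r is the last rank in the list there is no next rank, so the second approach needs the total number of items instead: total - r + 1 items share rank r. In the toy list that gives 8 - 8 + 1 = 1.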

14.2.5 III-2(e)

Can you find the mean Tuition and fees for an undergrad student in the US from the summary statistics? Justify your answer.

(3 points)

14.3 III-3

Find the average Tuition and fees for an undergrad student in the US.

(5 points)